ABSTRACT OF THE DISCLOSURE 

A novel seed coat specific peroxidase genomic sequence is characterized and 
presented. Adjacent DNA regulatory regions have also been characterized. The seed 
coat peroxidase is translated as a 352 amino acid precursor protein of 38 kDa 
comprising a 26 amino acid signal sequence which when cleaved results in a 35 kDa 
protein. Plants containing a dominant Ep allele accumulate large amounts of 
peroxidase in the hourglass cells of the subepidermis. Homozygous recessive epep 
genotypes do not accumulate peroxidase in the hourglass cells and are much reduced 
in total seed coat peroxidase activity. Probes derived from the cDNA or genomic
DNA can be used to detect polymorphisms that distinguish EpEp and epep
genotypes. Cosegregation of the polymorphisms in an F2 population from a cross of
EpEp and epep plants shows that the Ep locus encodes the seed coat peroxidase 
protein. Comparison of Ep and ep alleles indicates that the recessive gene lacks 87 
bp of sequence encompassing the translation start codon. The heterologous 
expression, as well as vectors and hosts to be used for the expression of the seed 
coat peroxidase, are also disclosed. The seed-specific DNA regulatory region may 
be used to control expression of genes of interest such as i) genes encoding 
herbicide resistance, or ii) biological control of insects or pathogens (e.g. B.
thuringiensis), or iii) viral coat proteins to protect against viral infections, or iv)
proteins of commercial interest (e.g. pharmaceutical), and v) proteins that alter the
nutritive value, taste, or processing of seeds.

Seed coat DNA regulatory region and peroxidase

The present invention relates to a novel DNA molecule comprising a plant 
seed coat specific DNA regulatory region and a novel structural gene encoding a 
peroxidase. The seed-coat specific DNA regulatory region may also be used to 
control the expression of other genes of interest within the seed coat. 

BACKGROUND OF THE INVENTION 



Full citations for references appear at the end of the Examples section. 

Peroxidases are enzymes catalyzing oxidative reactions that use H2O2 as an
electron acceptor. These enzymes are widespread and occur ubiquitously in plants
as isozymes that may be distinguished by their isoelectric points. Plant peroxidases
contribute to the structural integrity of cell walls by functioning in lignin
biosynthesis and suberization, and by forming covalent cross-linkages between
extensin, cellulose, pectin and other cell wall constituents (Campa, 1991).
Peroxidases are also associated with plant defence responses and resistance to
pathogens (Bowles, 1990; Moerschbacher, 1992). Soybeans contain 3 anionic
isozymes of peroxidase with a minimum Mr of 37 kDa (Sessa and Anderson, 1981).
Recently one peroxidase isozyme, localised within the seed coat of soybean, has
been characterized with a Mr of 37 kDa (Gillikin and Graham, 1991).

In an analysis of soybean seeds, Buttery and Buzzell (1968) showed that the
amount of peroxidase activity present in seed coats may vary substantially among
different cultivars. The presence of a single dominant gene Ep causes a high seed
coat peroxidase phenotype (Buzzell and Buttery, 1969). Homozygous recessive epep
plants are approximately 100-fold lower in seed coat peroxidase activity. This results from a
reduction in the amount of peroxidase enzyme present, primarily in the hourglass
cells of the subepidermis (Gijzen et al., 1993). In plants carrying the Ep gene,
peroxidase is heavily concentrated in the hourglass cells (osteosclereids). These
cells form a highly differentiated cell layer with thick, elongated secondary walls
and large intercellular spaces (Baker et al., 1987). Hourglass cells develop between
the epidermal macrosclereids and the underlying articulated parenchyma, and are a
prominent feature of seed coat anatomy at full maturity. The cytoplasm exudes
from the hourglass cells upon imbibition with water and a distinct peroxidase
isozyme constitutes 5 to 10% of the total soluble protein in EpEp seed coats. It
is not known why the hourglass cells accumulate large amounts of peroxidase, but
the sheer abundance and relative purity of the enzyme in soybean seed coats is
significant because peroxidases are versatile enzymes with many commercial and
industrial applications. Studies of soybean seed coat peroxidase have shown this
enzyme to have useful catalytic properties and a high degree of thermal stability
even at extremes of pH (McEldoon et al., 1995). These properties result in the
preferred use of soybean peroxidase, over that of horseradish peroxidase, in
diagnostic assays as an enzyme label for antigens, antibodies, oligonucleotide
probes, and within staining techniques. Johnson et al. report on the use of soybean
peroxidase for the deinking of printed waste paper (U.S. 5,270,770; December 6,
1994) and for the biocatalytic oxidation of primary alcohols (U.S. 5,391,488;
February 13, 1996). Soybean peroxidase has also been used as a replacement for
chlorine in the pulp and paper industry, or as formaldehyde replacement (Freiberg,
1995).

An anionic soybean peroxidase from seed coats has been purified (Gillikin 
and Graham, 1991). This protein has a pI of 4.1 and Mr of 37 kDa. A method for
the bulk extraction of peroxidase from seed hulls of soybean using a freeze-thaw
technique has also been reported (U.S. 5,491,085, February 13, 1996, Pokara and 
Johnson). 

Lagrimini et al. (1987) disclose the cloning of a ubiquitous anionic peroxidase
in tobacco encoding a protein of Mr of 36 kDa. This peroxidase has also been
overexpressed in transgenic tobacco plants (Lagrimini et al., 1990) and Maliyakal
discloses the expression of this gene in cotton (WO 95/08914).

Huangpu et al (1995) reported the partial cloning of a soybean anionic seed 
coat peroxidase. The 1031 bp sequence contained an open reading frame of 849 bp 
encoding a 283 amino acid protein with a Mr of 30,577. The Mr of this peroxidase
is 7 kDa less than what one would expect for a soybean seed coat peroxidase as
reported by Gillikin and Graham (1991) and possibly represents another peroxidase 
isozyme within the seed coat. 

The upstream promoter sequences for two poplar peroxidases have been 
described by Osakabe et al (1995). A number of characteristic regulatory sites were 
identified from comparison of these sequences to existing promoter elements.
Additionally, a cryptic promoter with apparent specificity for seed coat tissues was
isolated from tobacco by a promoter trapping strategy (Fobert et al., 1994). The
upstream regulatory sequences associated with the Ep gene in soybean are distinct
from these and other previously characterized promoters. The soybean Ep promoter
drives high-level expression in a cell and tissue specific manner. The peroxidase
protein encoded by the Ep gene accumulates in the seed coat tissues, especially in
the hourglass cells of the subepidermis. Minimal expression of the gene is detected
in root tissues. 

One problem arising from the desired use of soybean seed coat peroxidase 
is that there is variability between soybean varieties regarding peroxidase production 
(Buttery and Buzzell, 1986; Freiberg, 1995). Due to the commercial interest in the 
use of soybean seed coat peroxidase, new methods of producing this enzyme are
required. Therefore, the gene responsible for the expression of the 37 kDa isozyme 
in soybean seed coat was isolated and characterized. 

Furthermore, novel regulatory regions obtained from the genomic DNA of 
soybean seed coat peroxidase have been isolated and characterized and are useful 
in directing the expression of genes of interest in seed coat tissues.

SUMMARY OF THE INVENTION 

The present invention relates to a DNA molecule that encodes a soybean seed 
coat peroxidase and associated DNA regulatory regions.

This invention also embraces isolated DNA molecules having the nucleotide
sequence of either SEQ ID NO:1 (the cDNA encoding soybean seed coat
peroxidase) or SEQ ID NO:2 (the genomic sequence).



This invention also provides for a chimeric DNA molecule comprising a seed
coat-specific regulatory region having nucleotides 1-191 of SEQ ID NO:2 and a
gene of interest under control of this DNA regulatory region. Also included within
this invention are chimeric DNA molecules comprising genomic DNA sequences
exemplified by nucleotides 412-1041, 1234-2263 or 2430-2691 of SEQ ID NO:2.

The present invention also provides for vectors which comprise DNA 
molecules encoding soybean seed coat peroxidase. Such a construct may include 
the DNA regulatory region from SEQ ID NO:2 in conjunction with the seed coat
peroxidase gene, or the seed coat peroxidase gene under the control of any suitable
constitutive or inducible promoter of interest.



This invention is also directed towards vectors which comprise a gene of 
interest placed under the control of a DNA regulatory element derived from the
genomic sequence encoding soybean seed coat peroxidase. Such a regulatory
element includes nucleotides 1-191 of SEQ ID NO:2. Elements comprising
nucleotides 412-1041, 1234-2263 or 2430-2691 of SEQ ID NO:2 may also be used.

This invention also embraces prokaryotic and eukaryotic cells comprising the
vectors identified above. Such cells may include bacterial, insect, mammalian, and 
plant cell cultures. 

This invention also provides for transgenic plants comprising the seed coat
peroxidase gene under control of constitutive or inducible promoters. Furthermore,
this invention also relates to transgenic plants comprising the DNA regulatory
regions of nucleotides 1-191 of SEQ ID NO:2 controlling a gene of interest, or
comprising genes of interest in functional association with genomic DNA sequences
exemplified by nucleotides 412-1041, 1234-2263 or 2430-2691 of SEQ ID NO:2.

This invention is also directed to a method for the production of soybean
seed coat peroxidase in a host cell comprising:

i) transforming the host cell with a vector comprising an
oligonucleotide sequence that encodes soybean seed coat peroxidase;
and

ii) culturing the host cell under conditions to allow expression of the
soybean seed coat peroxidase.



This invention also provides for a process for producing a heterologous gene
of interest within seed coats of a transformed plant, comprising propagating a plant
transformed with a vector comprising a gene of interest under the control of
nucleotides 1-191 of SEQ ID NO:2.

Although the present invention is exemplified by a soybean seed coat
peroxidase and adjacent DNA regulatory regions, in practice any gene of interest can 
be placed downstream from the DNA regulatory region for seed coat specific 
expression.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features of the invention will become more apparent from 
the following description in which reference is made to the appended drawings
wherein:

Figure 1 is the cDNA and deduced amino acid sequence of soybean seed coat
peroxidase. Nucleotides are numbered by assigning +1 to the first base of the
ATG start codon; amino acids are numbered by assigning +1 to the N-terminal
Gln residue after cleavage of the putative signal sequence. The N-terminal
signal sequence, the region of the active site, and the heme-binding
domain are underlined. The numerals I, II and III placed directly above
single nucleotide gaps in the sequence indicate the three intron splice
positions. The target site and direction of five different PCR primers are
shown with dotted lines above the nucleotide sequence. An asterisk (*) marks
the translation stop codon.

Figure 2 is the genomic DNA sequence of the soybean seed coat peroxidase.

Figure 3 is a comparison of soybean seed coat peroxidase with other closely related
plant peroxidases. The GenBank accession numbers are provided next to the
name of the plant from which the peroxidase was isolated. The accession
number for the soybean sequence is L78163. (A) A comparison of the
nucleic acid sequences; (B) A comparison of the amino acid sequences.

Figure 4 shows restriction fragment length polymorphisms between EpEp and epep
genotypes using the seed coat peroxidase cDNA as probe. Genomic DNA of
soybean lines OX312 (epep) and OX347 (EpEp) was digested with
restriction enzyme, separated by electrophoresis in a 0.5% agarose gel,
transferred to nylon, and hybridized with 32P-labelled cDNA encoding the
seed coat peroxidase. The size of the hybridizing fragments was estimated 
by comparison to standards and is indicated on the right. 

Figure 5 exhibits the structure of the Ep locus. A 17 kb fragment including the Ep
locus is illustrated schematically. A 3.3 kb portion of the gene is enlarged
and exons and introns are represented by shaded and open boxes,
respectively. The final enlargement of the 5' region shows the location and
DNA sequence around the 87 bp deletion occurring in the ep allele of
soybean line OX312. Nucleotides are numbered by assigning +1 to the first
base of the ATG start codon.



Figure 6 displays PCR analysis of EpEp and epep genotypes using primers derived 
from the seed coat peroxidase cDNA. Genomic DNA from soybean lines 
OX312 (epep) and OX347 (EpEp) was used as template for PCR analysis 
with four different primer sets. Amplification products were separated by 
electrophoresis through a 0.8% agarose gel and visualized under UV light 
after staining with ethidium bromide. Genotype and primer combinations are 
indicated at the top of the figure. The sizes in base pairs of the amplified
DNA fragments are indicated on the right.

Figure 7 exhibits PCR analysis of an F2 population from a cross of EpEp and epep
genotypes. Genomic DNA was used as template for PCR analysis of the
parents (P) and 30 F2 individuals. The cross was derived from the soybean
lines OX312 (epep) and OX347 (EpEp). Plants were self pollinated and
seeds were collected and scored for seed coat peroxidase activity. The 
symbols (-) and (+) indicate low and high seed coat peroxidase activity, 
respectively. Primers prx<* and prx10- were used in the amplification
reactions. Products were separated by electrophoresis through a 0.8% agarose 
gel and visualized under UV light after staining with ethidium bromide. The 
migration of molecular markers and their corresponding size in kb is also 
shown (lanes M). 



Figure 8 displays PCR analysis of six different soybean cultivars with primers
derived from the seed coat peroxidase cDNA sequence. Genomic DNA was
used as template for PCR analysis of three EpEp cultivars and three epep
cultivars. Primers used in the amplification reactions and the size of the DNA
product are indicated on the left. Products were separated by electrophoresis
through a 0.8% agarose gel and visualized under UV light after staining with
ethidium bromide.

(A) Forward and reverse primers are downstream from deletion

(B) Forward primer anneals to site within deletion 

(C) Primers span deletion

DESCRIPTION OF PREFERRED EMBODIMENT

The present invention is directed to a novel oligonucleotide sequence 
encoding a seed coat peroxidase and associated DNA regulatory regions.

According to the present invention, DNA sequences that are "substantially
homologous" include sequences that are identified under conditions of high
stringency. "High stringency" refers to Southern hybridization conditions employing
washes at 65°C with 0.1x SSC, 0.5% SDS.

By "DNA regulatory region" it is meant any region within a genomic 
sequence that has the property of controlling the expression of a DNA sequence that 
is operably linked with the regulatory region. Such regulatory regions may include 
promoter or enhancer regions, and other regulatory elements recognized by one of 
skill in the art. A segment of the DNA regulatory region is exemplified in this
invention; however, as is understood by one of skill in the art, this region may be
used as a probe to identify surrounding regions involved in the regulation of
adjacent DNA, and such surrounding regions are also included within the scope of
this invention. 

In the context of this disclosure, the term "promoter" or "promoter region" 
refers to a sequence of DNA, usually upstream (5') to the coding sequence of a
structural gene, which controls the expression of the coding region by providing the
recognition for RNA polymerase and/or other factors required for transcription to
start at the correct site.

There are generally two types of promoters, inducible and constitutive. An
"inducible promoter" is a promoter that is capable of directly or indirectly activating 
transcription of one or more DNA sequences or genes in response to an inducer. 
In the absence of an inducer the DNA sequences or genes will not be transcribed. 
Typically the protein factor that binds specifically to an inducible promoter to
activate transcription is present in an inactive form which is then directly or
indirectly converted to the active form by the inducer. The inducer can be a
chemical agent such as a protein, metabolite, growth regulator, herbicide or phenolic
compound or a physiological stress imposed directly by heat, cold, salt, or toxic
elements or indirectly through the action of a pathogen or disease agent such as a
virus. A plant cell containing an inducible promoter may be exposed to an inducer
by externally applying the inducer to the cell or plant such as by spraying, watering,
heating or similar methods.



By "constitutive promoter" it is meant a promoter that directs the expression
of a gene throughout the various parts of a plant and continuously throughout plant
development. Examples of known constitutive promoters include those associated
with the CaMV 35S transcript and Agrobacterium Ti plasmid nopaline synthase
gene.

The chimeric gene constructs of the present invention can further comprise 
a 3' untranslated region. A 3' untranslated region refers to that portion of a gene
comprising a DNA segment that contains a polyadenylation signal and any other
regulatory signals capable of effecting mRNA processing or gene expression. The
polyadenylation signal is usually characterized by effecting the addition of
polyadenylic acid tracts to the 3' end of the mRNA precursor. Polyadenylation
signals are commonly recognized by the presence of homology to the canonical form
5'-AATAAA-3' although variations are not uncommon.
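
As an illustration only, the canonical signal described above can be located in a sequence by a simple motif scan. The following Python sketch is not part of the disclosed method: the function name find_polya_signals and the example fragment are hypothetical, and only exact matches to 5'-AATAAA-3' are reported, so the variant signals mentioned above would be missed.

```python
def find_polya_signals(seq, motif="AATAAA"):
    """Return the 1-based start position of every exact occurrence of motif in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = seq.find(motif, start + 1)  # continue search; overlapping hits allowed
    return positions


# Hypothetical 3' fragment, invented for illustration; not taken from SEQ ID NO:2.
example_fragment = "GCTTAATAAAGGCTTTCAATAAATTG"
print(find_polya_signals(example_fragment))
```

A scan of this kind only flags candidate signals; whether a given match functions in mRNA processing would still have to be established experimentally.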

Examples of suitable 3' regions are the 3' transcribed non-translated regions
containing a polyadenylation signal of Agrobacterium tumour inducing (Ti) plasmid
genes, such as the nopaline synthase (Nos gene) and plant genes such as the soybean
storage protein genes and the small subunit of the ribulose-1,5-bisphosphate
carboxylase (ssRUBISCO) gene. The 3' untranslated region from the structural
gene of the present construct can therefore be used to construct chimeric genes for 
expression in plants. 



The chimeric gene construct of the present invention can also include further 
enhancers, either translation or transcription enhancers, as may be required. These 
enhancer regions are well known to persons skilled in the art, and can include the
ATG initiation codon and adjacent sequences. The initiation codon must be in phase
with the reading frame of the coding sequence to ensure translation of the entire
sequence. The translation control signals and initiation codons can be from a variety
of origins, both natural and synthetic. Translational initiation regions may be
provided from the source of the transcriptional initiation region, or from the
structural gene. The sequence can also be derived from the promoter selected to
express the gene, and can be specifically modified so as to increase translation of
the mRNA.
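
The requirement that the initiation codon be in phase with the coding sequence can be checked computationally when a chimeric construct is designed. The sketch below is a minimal illustration under stated assumptions: the function name atg_in_phase, the 0-based indexing and the example sequences are hypothetical and do not come from the disclosure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def atg_in_phase(seq, atg_index, cds_index):
    """Return True if the ATG at atg_index (0-based) is in the same reading frame
    as the coding sequence starting at cds_index, with no intervening stop codon."""
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        return False  # no initiation codon at the stated position
    if atg_index > cds_index or (cds_index - atg_index) % 3 != 0:
        return False  # downstream of the coding sequence, or out of phase
    for i in range(atg_index, cds_index, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return False  # translation would terminate before the coding sequence
    return True


# Hypothetical junction sequences, invented for illustration only.
print(atg_in_phase("CCATGGCTAAAGGG", 2, 8))  # ATG two codons upstream, same frame
print(atg_in_phase("CCATGGCTAAAGGG", 2, 9))  # shifted by one base, out of phase
```

The check is deliberately narrow: it considers only frame and premature stop codons, and says nothing about the sequence context that influences translation efficiency.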

To aid in identification of transformed plant cells, the constructs of this
invention may be further manipulated to include plant selectable markers. Useful
selectable markers include enzymes which provide for resistance to an antibiotic
such as gentamycin, hygromycin, kanamycin, and the like. Similarly, enzymes
providing for production of a compound identifiable by colour change such as GUS
(β-glucuronidase), or luminescence, such as luciferase, are useful.

Also considered part of this invention are transgenic plants containing the 
chimeric gene construct of the present invention. Methods of regenerating whole
plants from plant cells are known in the art, and the method of obtaining
transformed and regenerated plants is not critical to this invention. In general,
transformed plant cells are cultured in an appropriate medium, which may contain
selective agents such as antibiotics, where selectable markers are used to facilitate
identification of transformed plant cells. Once callus forms, shoot formation can be
encouraged by employing the appropriate plant hormones in accordance with known
methods and the shoots transferred to rooting medium for regeneration of plants.
The plants may then be used to establish repetitive generations, either from seeds 
or using vegetative propagation techniques. 

The constructs of the present invention can be introduced into plant cells
using Ti plasmids, Ri plasmids, plant virus vectors, direct DNA transformation,
micro-injection, electroporation, etc. For reviews of such techniques see for
example Weissbach and Weissbach (1988) and Geierson and Corey (1988). The
present invention further includes a suitable vector comprising the chimeric gene
construct. 

Buttery and Buzzell (1968) showed that the amount of peroxidase activity 
present in seed coats may vary substantially among different cultivars. The presence 
of a single dominant gene Ep causes a high seed coat peroxidase phenotype (Buzzell
and Buttery, 1969). Homozygous recessive epep plants are approximately 100-fold lower in seed
coat peroxidase activity. This results from a reduction in the amount of peroxidase
enzyme present, primarily in the hourglass cells of the subepidermis (Gijzen et al.,
1993). In plants carrying the Ep gene, peroxidase is heavily concentrated in the 
hourglass cells (osteosclereids). These cells form a highly differentiated cell layer 
with thick, elongated secondary walls and large intercellular spaces (Baker et al., 
1987). 



Screening a seed coat cDNA library prepared from EpEp plants with a 
degenerate primer derived from the active site domain of plant peroxidase resulted 
in a high frequency of positive clones. Many of these clones encode identical 
cDNA molecules and indicate that the corresponding mRNA is an abundant 
transcript in developing seed coat tissues. The sequence of the cDNA is shown in 
Figure 1.



Previous studies on soybean seed coat peroxidase indicated that this enzyme 
is heavily glycosylated and that carbohydrate contributes 18% of the mass of the 
apo-enzyme (Gray et al., 1996). The seven potential glycosylation sites identified
from the amino acid sequence of the seed coat peroxidase (Figure 1) would
accommodate the five or six N-linked glycosylation sites proposed by Gray et al.
(1996). The heme-binding domain encompasses residues Asp161 to Phe171 and the
acid-base catalysis region from Gly33 to Cys44. The two regions are highly
conserved among plant peroxidases and are centred around functional histidine
residues, His169 and His40. There are eight conserved cysteine residues in the
mature protein that provide for four di-sulfide bridges found in other plant
peroxidases and predicted from the crystal structure of peanut peroxidase (Welinder,
1992; Schuller et al., 1996). Other conserved areas include residues Cys91 to
Ala105 and Val119 to Leu127 that occur in or around helix D. The most divergent
aspects of the seed coat peroxidase protein sequence are the carboxy- and
amino-terminal regions. These sequences probably provide special targeting signals for the
proper processing and delivery of the peptide chain. It is possible the
carboxy-terminal extension of the seed coat peroxidase is removed at maturity, as has been
shown for certain barley and horseradish peroxidases (Welinder, 1992).

The molecular mass of the enzyme has been determined by denaturing gel
electrophoresis to be 37 kDa (Sessa and Anderson, 1981; Gillikin and Graham,
1991) or 4? kDa (Gijzen et al., 1993). Analysis by mass spectrometry indicated a
mass of 40,622 Da for the apo-enzyme and 33,250 Da after deglycosylation (Gray
et al., 1996). These values are in good agreement with the mass of 35,377 Da
calculated from the predicted amino acid sequence for the mature apo-protein prior
to glycosylation and other modifications. Huangpu et al. (1995) reported an anionic
seed coat peroxidase having a Mr of 30,577 Da and characterized a partial cDNA
encoding this protein. This 1031 bp cDNA contained an open reading frame of 849
bp encoding a 283 amino acid protein. There are several differences between this
reported sequence and the sequence of this invention that are manifest at the amino
acid level (see Figure 3 for sequence comparison). The enzyme encoded by the
gene reported by Huangpu et al. is different from that of this invention as the
peroxidase of this invention has a Mr of 35,377 Da.
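
The figures quoted above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only; the constant and function names are hypothetical, and the numerical values are those stated in the text.

```python
# Masses quoted in the text, in daltons.
MASS_GLYCOSYLATED = 40622    # mass spectrometry, before deglycosylation
MASS_DEGLYCOSYLATED = 33250  # mass spectrometry, after deglycosylation
MASS_PREDICTED = 35377       # calculated from the predicted mature sequence
MASS_HUANGPU = 30577         # Mr reported by Huangpu et al. (1995)


def carbohydrate_fraction(glycosylated, deglycosylated):
    """Fraction of the glycosylated mass that is lost on deglycosylation."""
    return (glycosylated - deglycosylated) / glycosylated


def codons_in_orf(orf_bp):
    """Number of codons in an open reading frame of orf_bp base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("open reading frame length is not a multiple of three")
    return orf_bp // 3


print(f"carbohydrate fraction: {carbohydrate_fraction(MASS_GLYCOSYLATED, MASS_DEGLYCOSYLATED):.1%}")
print(f"codons in 849 bp ORF: {codons_in_orf(849)}")
print(f"predicted minus Huangpu Mr: {MASS_PREDICTED - MASS_HUANGPU} Da")
```

The carbohydrate fraction computed this way is close to the 18% attributed to Gray et al. (1996), and an 849 bp open reading frame corresponds to 283 codons, matching the protein length reported by Huangpu et al. (1995).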

Genomic DNA blots probed with the seed coat peroxidase cDNA produced 
two or three hybridizing fragments of varying intensity with most restriction enzyme 
digestions, even though several peroxidase isozymes are present in soybean. The
results indicate that this seed coat peroxidase is present as a single gene that does
not share sufficient homology with most other peroxidase genes to anneal under
conditions of high stringency. 



The genomic DNA sequence (Figure 2) comprises four exons spanning bp
191-411 (exon 1), 1042-1233 (exon 2), 2264-2429 (exon 3) and 2692-3174 (exon
4) and three introns comprising 412-1041 (intron 1), 1234-2263 (intron 2) and
2430-2691 (intron 3). Features of the upstream regulatory region of the genomic DNA
include a TATA box centred on bp 147 and a cap signal 32 bp downstream centred on
bp 179. Also noted within the genomic sequence are three polyadenylation signals
centred on bp 3180, 3258 and 3323, and a polyadenylation site at bp 3359.
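
The coordinates given above can be tabulated and checked for internal consistency. The Python sketch below is an illustration only; the names EXONS, INTRONS, segment_length and is_contiguous are hypothetical, and the coordinates are the 1-based, inclusive positions stated in the text.

```python
# 1-based inclusive coordinates as stated for the genomic sequence (Figure 2).
EXONS = [(191, 411), (1042, 1233), (2264, 2429), (2692, 3174)]
INTRONS = [(412, 1041), (1234, 2263), (2430, 2691)]
TATA_BOX = 147
CAP_SIGNAL = 179


def segment_length(segment):
    """Length in bp of a (start, end) segment with inclusive coordinates."""
    start, end = segment
    return end - start + 1


def is_contiguous(exons, introns):
    """True if exons and introns tile the region with no gaps or overlaps."""
    segments = sorted(exons + introns)
    return all(a[1] + 1 == b[0] for a, b in zip(segments, segments[1:]))


print("exon lengths:", [segment_length(e) for e in EXONS])
print("intron lengths:", [segment_length(i) for i in INTRONS])
print("contiguous:", is_contiguous(EXONS, INTRONS))
print("TATA box to cap signal:", CAP_SIGNAL - TATA_BOX, "bp")
```

With these coordinates the exons and introns abut one another exactly, and the spacing between the TATA box and the cap signal agrees with the 32 bp stated.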

This promoter is considered seed coat specific since the peroxidase protein 
encoded by the Ep gene accumulates in the seed coat tissues, especially in the
hourglass cells of the subepidermis, and is not expressed in other tissues, aside from
a marginal expression of peroxidase in the root tissues. The DNA regulatory
regions of the genomic sequence of Figure 2 are used to control the expression of
the adjacent peroxidase gene in seed coat tissue. Such regulatory regions include
nucleotides 1-191. Other regions of interest include nucleotides 412-1041, 1234-2263
and/or 2430-2691 of SEQ ID NO:2. Therefore other proteins of interest may
be expressed in seed coat tissues by placing a gene capable of expressing the protein
of interest under the control of the DNA regulatory elements of this invention.
Genes of interest include but are not restricted to herbicide resistant genes, genes
encoding viral coat proteins, or genes encoding proteins conferring biological control
of pest or pathogens such as an insecticidal protein, for example B. thuringiensis
toxin. Other genes include those capable of the production of proteins that alter the
taste of the seed and/or that affect the nutritive value of the soybean.



A modified DNA regulatory sequence may be obtained by introducing
changes into the natural sequence. Such modifications can be done through
techniques known to one of skill in the art such as site-directed mutagenesis,
reducing the length of the regulatory region using endonucleases or exonucleases, or
increasing the length through the insertion of linkers or other sequences of interest.
Reducing the size of DNA regulatory region may be achieved by removing 3' or 5'
regions of the regulatory region of the natural sequence by using an endonuclease
such as BAL 31 (Sambrook et al., 1989). However, any such DNA regulatory region
must still function as a seed coat specific DNA regulatory region.

It may be readily determined if such modified DNA regulatory elements are
capable of acting in a seed coat specific manner by transforming plant cells with such
regulatory elements controlling the expression of a suitable marker gene, culturing
these plants and determining the expression of the marker gene within the seed coat
as outlined above. One may also analyze the efficacy of DNA regulatory elements
by introducing constructs comprising a DNA regulatory element of interest operably 
linked with an appropriate marker into seed coat tissues by using particle 
bombardment directed to seed coat tissue and determining the degree of expression 
of the regulatory region (reference).

Two tandemly arranged genes encoding anionic peroxidase expressed in
stems of Populus kitakamiensis, prxA3a and prxA4a, have been cloned and
characterized (Osakabe et al., 1995). Both of these genomic sequences contained
four exons and three introns and encoded proteins of 347 and 343 amino acids,
respectively. The two genes encode distinct isozymes with deduced Mr of 33.9 and
34.6 kDa. Furthermore, a 532 bp promoter derived from the peroxidase gene of
Armoracia rusticana has also been reported (Toyobo KK, JP 4,126,088, April 27,
1992). However, a search using GenBank revealed no substantial similarity between
the promoter region, or introns 1, 2 and 3 of this invention and those within the
literature.

Digestion of the genomic DNA with BamHI or SacI revealed restriction
fragment length polymorphisms that distinguished EpEp and epep genotypes.
Although the XbaI digestion did not produce a readily detectable polymorphism, the
size of the hybridizing fragment in both genotypes was approximately 14 kb. Thus, a 0.3 kb size
difference is outside of the resolving power of the separation for fragments this
large. Sequence analysis of EpEp and epep genotypes indicates that the mutant ep
allele is missing 87 bp of sequence at the 5' end of the structural gene. This would
account for the drastically reduced amounts of peroxidase enzyme present in seed
coats of epep plants since the deletion includes the translation start codon and the
entire N-terminal signal sequence. However, the 87 bp deletion cannot account for
the differences observed in the RFLP analysis since the missing fragment does not
include a BamHI site and is much smaller than the 0.3 kb polymorphism detected
in the SacI digestion. Thus, other genetic rearrangements must occur in the vicinity
of the ep locus that lead to these polymorphisms. 



The results shown here indicate that the mutation causing low seed coat 
peroxidase activity occurs in the structural gene encoding the enzyme. This mutation 
is an 87 bp deletion in the 5' region of the gene encompassing the translation start 
site. Several different low peroxidase cultivars share a similar mutation in the same 
area, suggesting that the recessive ep alleles have a common origin or that the region 
is prone to spontaneous deletions or rearrangements. 



Due to the industrial interest in soybean seed coat peroxidase, alternate 
sources for the production of this enzyme are needed. The DNA of this invention, 
encoding the seed coat soybean peroxidase under the control of a suitable promoter 
and expressed within a host of interest, can be used for the preparation of 
recombinant soybean seed coat peroxidase enzyme. 

Soybean seed coat peroxidase has been characterized as a lignin-type 
peroxidase that has industrially significant properties, i.e. high activity and stability 
under acidic conditions; exhibits wide substrate specificity; equivalent catalytic 
properties to that of Phanerochaete chrysosporium lignin peroxidase (the currently 
preferred enzyme used for treatment of industrial waste waters (Wick 1995)) but is 
at least 150-fold more stable; more stable than horseradish peroxidase which is also 
used in industrial effluent treatments and medical diagnostic kits (McEldoon et al. 
1995). These properties are useful within industrial applications for the degradation 
of natural aromatic polymers including lignin and coal (McEldoon et al. 1995), and 
the preferred use of soybean peroxidase, over that of horseradish peroxidase, in 
medical diagnostic tests as an enzyme label for antigens, antibodies, oligonucleotide 
probes, and within staining techniques (Wick 1995). Soybean peroxidase is also 
used in the deinking of printed waste paper (Johnson et al., US 5,370,770; 
December 6, 1994) and for the biocatalytic oxidation of primary alcohols (Johnson 
et al., US 5,[…],488; February 13, 1996). Soybean peroxidase has also been used 
as a replacement for chlorine in the pulp and paper industry, in order to remove 
chlorine, phenolic or aromatic amine containing pollutants from industrial waste 
waters (Wick 1995), or as formaldehyde replacement (Freiberg, 1995) for use in 
adhesives, abrasives, and protective coatings (e.g. varnish and resins; Wick 1995). 

Furthermore, the seed coat peroxidase gene may be expressed in an organ or 
tissue specific manner within a plant. For example, the quality and strength of 
cotton fibre can be improved through the over-expression of cotton or horseradish 
peroxidase placed under the control of a fibre-specific promoter (Maliyakal, WO 
95/08914; April 6, 1995). 

Similarly, seed-specific DNA regulatory regions of this invention may be 
used to control expression of genes of interest such as: 

i) genes encoding herbicide resistance, or 

ii) biological control of insects or pathogens (e.g. B. thuringiensis), or 

iii) viral coat proteins to protect against viral infections, or 

iv) proteins of commercial interest (e.g. pharmaceutical), and 

v) proteins that alter the nutritive value, taste, or processing of seeds 

within the seed coat of plants. 

While this invention is described in detail with particular reference to 
preferred embodiments thereof, said embodiments are offered to illustrate but not 
to limit the invention. 

EXAMPLES 

Plant material 

All soybean (Glycine max [L.] Merr) cultivars and breeding lines were from 
the collection at Agriculture Canada, Harrow, Ontario. 

Seed Coat cDNA Library Construction and Screening 



High seed coat peroxidase (EpEp) soybean cultivar Harosoy 63 plants were 
grown in field plots outdoors. Pods were harvested 35 days after flowering and 
seeds in the mid-to-late developmental stage were excised. The average fresh mass 
was 250 mg per seed. Seed coats were dissected and immediately frozen in liquid 
nitrogen. The frozen tissue was lyophilized and total RNA extracted in 100 mM 
Tris-HCl pH 9.0, 20 mM EDTA, 4% (w/v) sarkosyl, 200 mM NaCl, and 16 mM 
DTT, and precipitated with LiCl using the standard phenol/chloroform method 
described by Wang and Vodkin (1994). The poly(A)+ RNA was purified on 
oligo(dT) cellulose columns prior to cDNA synthesis, size selection, ligation into the 
λ ZAP Express vector, and packaging according to instructions (Stratagene). A 
degenerate oligonucleotide with the 5' to 3' sequence of 
TT(C/T)CA(C/T)GA(C/T)TG(C/T)TT(C/T)GT was 5' end labelled to high specific 
activity and used as a probe to isolate peroxidase cDNA clones (Sambrook et al., 
1989). Duplicate plaque lifts were made to nylon filters (Amersham), UV fixed, and 
prehybridized at 36°C for 3 h in 6 x SSC, 20 mM Na2HPO4 (pH 6.8), 5 x 
Denhardt's, 0.4 % SDS, and 500 µg/mL salmon sperm DNA. Hybridization was in 
the same buffer, without Denhardt's, at 36°C for 16 h. Filters were washed quickly 
with several changes of 6 x SSC and 0.1 % SDS, first at room temperature and 
finally at 40°C, prior to autoradiography for 16 h at -70°C with an intensifying 
screen. 
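
The degenerate probe can be enumerated explicitly. The following is a minimal sketch, assuming the probe reads TT(C/T)CA(C/T)GA(C/T)TG(C/T)TT(C/T)GT and writing each (C/T) position with the IUPAC code Y; the helper names `expand` and `translate` are illustrative only and are not part of the disclosure. It lists every unambiguous oligonucleotide in the mixture and translates the complete codons.

```python
from itertools import product

# Degenerate probe, with IUPAC code Y standing for (C/T). Assumed reading of the text.
PROBE = "TTYCAYGAYTGYTTYGT"
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT"}

# Only the codons needed for this probe (standard genetic code).
CODONS = {"TTC": "F", "TTT": "F", "CAC": "H", "CAT": "H",
          "GAC": "D", "GAT": "D", "TGC": "C", "TGT": "C"}


def expand(probe):
    """Return every unambiguous sequence encoded by a degenerate oligonucleotide."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in probe))]


def translate(seq):
    """Translate complete codons only; trailing bases are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, usable, 3))


variants = expand(PROBE)
peptides = {translate(v) for v in variants}
print(len(PROBE), len(variants), peptides)
```

Every member of the mixture encodes the same five residues (Phe-His-Asp-Cys-Phe); the final two bases, GT, are the start of a valine codon, consistent with the conserved active site region shown for the deduced protein.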
Genomic DNA Isolation, Library Construction, and DNA Blot Analysis 

Soybean genomic DNA was isolated from leaves of greenhouse grown plants 
or from etiolated seedlings grown in vermiculite. Plant tissue was frozen in liquid 
nitrogen and lyophilized before extraction and purification of DNA according to the 
method of Dellaporta et al. (1983). Restriction enzyme digestion of 30 µg DNA, 
separation on 0.5% agarose gels and blotting to nylon membranes followed standard 
protocols (Sambrook et al., 1989). For construction of the genomic library, DNA 
purified from Harosoy 63 leaf tissue was partially digested with BamHI and ligated 
into the λ FIX II vector (Stratagene). Gigapack XL packaging extract (Stratagene) 
was used to select for inserts of 9 to 22 kb. After library amplification, duplicate 
plaque lifts were hybridized to cDNA probe. 

Blots or filter lifts were prehybridized for 2 h at 65°C in […] x SSC, 5 x 
Denhardt's, 0.5 % SDS, and 100 µg/mL salmon sperm DNA. Radiolabelled cDNA 
probe (20 to 50 ng) was prepared using the Ready-to-Go labelling kit (Pharmacia) 
and 32P-dCTP (Amersham). Unincorporated 32P-dCTP was removed by spin column 
chromatography before adding radiolabelled cDNA to the hybridization buffer 
(identical to prehybridization buffer without Denhardt's). Hybridization was for 20 
h at 65°C. Membranes were washed twice for 15 min at room temperature with 2 
x SSC, 0.5 % SDS, followed by two 30 min washes at 65°C with 0.1 x SSC, 0.5 % 
SDS. Autoradiography was for 20 h at -70°C using an intensifying screen and 
X-OMAT film (Kodak). 

DNA Sequencing 

Sequencing of DNA was performed using dye-labelled terminators and Taq-FS 
DNA polymerase (Perkin-Elmer). The PCR protocol consisted of 25 cycles of 
a 30 sec melt at 96°C, 15 sec annealing at 50°C, and 4 min extension at 60°C. 
Samples were analyzed on an Applied Biosystems 373A Stretch automated DNA 
sequencer. 

Polymerase Chain Reaction 

PCR amplifications contained 1 ng template DNA, 5 pmol each primer, 1.5 
mM MgCl2, 0.15 mM deoxynucleotide triphosphates mix, 10 mM Tris-HCl, 50 mM 
KCl, pH 8.3, and 1 unit of Taq polymerase (Gibco BRL) in a total volume of 25 
µL. Reactions were performed in a Perkin-Elmer 480 thermal cycler. After an initial 
2 min denaturation at 94°C, there were 35 cycles of 1 min denaturation at 94°C, 1 
min annealing at 52°C, and 2 min extension at 72°C. A final 7 min extension at 
72°C completed the program. The following primers were used for PCR analysis: 

prx2+     CTTCCAAATATCAACTCAAT 

prx6-     TAAAGTTGGAAAAGAAAGTA 

prx9+     ATGCATGCAGGTTTTTCAGT 

prx10-    TTGCTCGCTTTCTATTGTAT 

prx12+    TCTTCGATGCTTCTTTCACC 

prx29+    CATAAACAATACGTACGTGAT 
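
As a rough check on the primers, the sketch below computes, for four of the 20-mers listed above, the coding-strand sequence each one targets and a rule-of-thumb melting temperature. It is illustrative only: the Wallace rule (2°C per A/T, 4°C per G/C) is a common approximation for short oligonucleotides and is not stated in the source, and the function names are hypothetical.

```python
# Primer sequences as listed in the text (5' to 3').
PRIMERS = {
    "prx6-": "TAAAGTTGGAAAAGAAAGTA",
    "prx9+": "ATGCATGCAGGTTTTTCAGT",
    "prx10-": "TTGCTCGCTTTCTATTGTAT",
    "prx12+": "TCTTCGATGCTTCTTTCACC",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(seq):
    """Approximate Tm in degrees C: 2*(A+T) + 4*(G+C). An assumption, not from the source."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    # A minus-strand primer corresponds to the reverse complement on the coding strand.
    coding_site = reverse_complement(seq) if name.endswith("-") else seq
    print(name, len(seq), wallace_tm(seq), coding_site)
```

For example, the coding-strand site of prx10- is ATACAATAGAAAGCGAGCAA, which can be located in the cDNA sequence of SEQ ID NO:1.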
Seed Coat Peroxidase Assays 

The F3 seed was measured for peroxidase activity to score the phenotype of 
the F2 population because the seed testa is derived from maternal tissue. The seeds 
were briefly soaked in water and the seed coat was dissected from the embryo and 
placed in a vial. Ten drops (~500 µL) of 0.5% guaiacol were added and the sample 
was left to stand for 10 min before adding one drop (~50 µL) of 0.1% H2O2. An 
immediate change in colour of the solution, from clear to red, indicates a positive 
result and high seed coat peroxidase activity. 

Example 1: The Seed Coat Peroxidase cDNA and Genomic DNA Sequences 

To isolate the seed coat peroxidase transcript, a cDNA library was 
constructed from developing seed coat tissue of the EpEp cultivar Harosoy 63. The 
primary library contained 10^6 recombinant plaque forming units and was amplified 
prior to screening. A degenerate 17-mer oligonucleotide corresponding to the 
conserved active site domain of plant peroxidases was used to probe the library. In 
screening 10,000 plaque forming units, 12 positive clones were identified. The 
cDNA insert size of the clones ranged from 0.5 to 2.5 kb, but six clones shared a 
common insert size of 1.3 kb. These six clones (soyprx03, soyprx05, soyprx06, 
soyprx11, soyprx12, and soyprx14) were chosen for further characterization since the 
1.3 kb insert size matched the expected peroxidase transcript size. Sequence analysis 
of the six clones showed that they contained identical cDNA transcripts encoding 
a peroxidase and that each resulted from an independent cloning event since the 
junction between the cloning vector and the transcript was different in all cases. 

Since it was not clear that the entire 5' end of the cDNA transcript was 
complete in any of the cDNA clones isolated, the structural gene corresponding to 
the seed coat peroxidase was isolated from a Harosoy 63 genomic library. A partial 
BamHI digest of genomic DNA was used to construct the library and more than 10^6 
plaque forming units were screened using the cDNA probe. A positive clone, 
G25-2-1-1-1, containing a 17 kb insert was identified and a 3.3 kb region encoding the 
peroxidase was sequenced (Figure 2). 

The genomic sequence matched the cDNA sequence except for three introns 
encoded within the gene. The genomic sequence also revealed two additional 
translation start codons, beginning one bp and 10 bp upstream from the 5' end of 
the longest cDNA transcript isolated. Figure 1 shows the deduced cDNA sequence. 
The open reading frame of 1056 bp encodes a 352 amino acid protein of 38,106 Da. 
A heme-binding domain, a peroxidase active site signature sequence, and seven 
potential N-glycosylation sites were identified from the deduced amino acid 
sequence. The first 26 amino acid residues conform to a membrane spanning 
domain. Cleavage of this putative signal sequence releases a mature protein of 326 
residues with a mass of 35,377 Da and an estimated pI of 4.4. 



Relevant features of the genomic fragment (Figure 2) include four exons at 
bp 192-411 (exon 1), 1042-1233 (exon 2), 2264-2429 (exon 3) and 2692-3174 
(exon 4) and three introns at bp 412-1041 (intron 1), 1234-2263 (intron 2) and 
2430-2691 (intron 3). The 191 bp regulatory region of the genomic DNA includes a 
TATA box centred on bp 147 and a cap signal 32 bp downstream centred at bp 
179. Also noted within the genomic sequence are three polyadenylation signals 
centred on bp 3180, 3258, 3323 and a polyadenylation site at bp 3359. 
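
The feature coordinates can be handled as simple inclusive intervals. The sketch below is a minimal illustration, assuming 1-based inclusive coordinates and taking the exon 3 start as 2264, as given in the feature table of SEQ ID NO:2; the function names are hypothetical. It computes the length of each exon and intron and checks that the features tile the gene without gaps or overlaps.

```python
# Feature coordinates (1-based, inclusive) on the 3359 bp genomic fragment.
EXONS = [(192, 411), (1042, 1233), (2264, 2429), (2692, 3174)]
INTRONS = [(412, 1041), (1234, 2263), (2430, 2691)]


def length(span):
    """Length in bp of an inclusive (start, end) interval."""
    start, end = span
    return end - start + 1


def is_contiguous(exons, introns):
    """True if exons and introns alternate with no gaps or overlaps."""
    ordered = sorted(exons + introns)
    return all(a[1] + 1 == b[0] for a, b in zip(ordered, ordered[1:]))


exon_lengths = [length(e) for e in EXONS]
intron_lengths = [length(i) for i in INTRONS]
print(exon_lengths, intron_lengths, is_contiguous(EXONS, INTRONS))
```

Lengths derived from inclusive coordinates can differ by a base from figures quoted elsewhere in the text, depending on how the exon and intron boundaries are counted.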

Figure 3 illustrates the relationship between the soybean seed coat peroxidase 
and other selected plant peroxidases. The soybean sequence is most closely related 
to four peroxidase cDNAs isolated from alfalfa (see Figure 3), sharing from 65 to 
67% identity at the amino acid level with the alfalfa proteins (X90693, X90694, 
X90692, el-Turk et al. 1996; L36156, Abrahams et al. 1994). When compared with 
other plant peroxidases, soybean seed coat peroxidase exhibits from 60 to 65% 
identity with poplar (D30653 and D30652, Osakabe et al. 1994) and flax (LOS 54, 
Omann and Tyson 1995); 50 to 60% identity with horseradish (M37156, Fujiyama 
et al. 1988), tobacco (D11396, Osakabe et al. 1993), and cucumber (M91373, 
Rasmussen et al. 1992); and 49% identity with barley (L36093, Scott-Craig et al. 
1994), wheat (X85228, Baga et al. 1995) and tobacco (L02124, Diaz-De-Leon et al. 
1993) peroxidases. 

Example 2: DNA Blot Analysis Using the Seed Coat Peroxidase cDNA 
Probe Reveals Restriction Fragment Length Polymorphisms Between EpEp and epep 
Genotypes 

Genomic DNA blots of OX347 (EpEp) and OX312 (epep) plants were 
hybridized with 32P-labelled cDNA to estimate the copy number of the seed coat 
peroxidase gene and to determine if this locus is polymorphic between the two 
genotypes. Figure 4 shows the hybridization patterns after digestion with BamHI, 
XbaI and SacI. Restriction fragment length polymorphisms are clearly visible in the 
BamHI and SacI digestions. The BamHI digestion produced a strongly hybridizing 
17 kb fragment and a faint 3.4 kb fragment in the EpEp genotype. The 3.4 kb 
BamHI fragment is visible in the epep genotype but the 17 kb fragment has been 
replaced by a signal at >20 kb. The SacI digestion resulted in detection of three 
fragments in EpEp and epep plants. At least two fragments were expected here since 
the cDNA sequence has a SacI site within the open reading frame. However, the 
smallest and most strongly hybridizing of these fragments is 5.2 kb in EpEp plants 
and 4.9 kb in epep plants. Digestion with XbaI produced hybridizing fragments of 
~14 kb and 7.8 kb for both genotypes, with the larger fragment showing a stronger 
signal. 

Example 3: A Deletion Mutation Occurs in the Recessive Locus 

The structural gene encoding the seed coat peroxidase is schematically 
illustrated in Figure 5. The 17 kb BamHI fragment encompassing the gene includes 
191 bp of sequence upstream from the translation start codon, three introns of 631 
bp, 1030 bp, and 263 bp, and 13 kb of sequence downstream from the 
polyadenylation site. The arrangement of four exons and three introns and the 
placement of introns within the sequence is similar to that described for other 
peroxidases ([…]; Osakabe et al. 1995). 



Primers were designed from the DNA sequence to compare EpEp and epep 
genotypes by PCR analysis. Figure 6 shows PCR amplification products from four 
different primer combinations using OX312 (epep) and OX347 (EpEp) genomic 
DNA as template. The primer annealing site for prx29+ begins 182 bp upstream 
from the ATG start codon; the remaining primer sites are shown in Figure 1. 
Amplification with primers prx2+ and prx6-, and with prx12+ and prx10- produced 
the expected products of 1.9 kb and 860 bp, respectively, regardless of the Ep/ep 
genotype of the template DNA. However, PCR amplification with primers prx9+ 
and prx10-, and with prx29+ and prx10- generated the expected products only when 
template DNA was from plants carrying the dominant Ep allele. When template 
DNA was from an epep genotype, no product was detected using primers prx9+ and 
prx10- and a smaller product was amplified with primers prx29+ and prx10-. The 
products resulting from amplification of OX312 or OX347 template DNA with 
primers prx29+ and prx10- were directly sequenced and compared. The 
polymorphism is due to an 87 bp deletion occurring within this DNA fragment in 
OX312 plants, as shown in Figure 5. This deletion begins nine bp upstream from 
the translation start codon and includes 78 bp of sequence at the 5' end of the open 
reading frame, including the prx9+ primer annealing site. 
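
The figures quoted for the deletion and the signal sequence are consistent with one another, as the short worked check below illustrates. It is a sketch using only numbers stated in the text; the variable names are chosen for illustration.

```python
# Values stated in the description.
ORF_BP = 1056              # open reading frame
PRECURSOR_AA = 352         # precursor protein
SIGNAL_PEPTIDE_AA = 26     # putative signal sequence
DELETION_BP = 87           # deletion in the ep allele
UPSTREAM_BP = 9            # deletion begins nine bp upstream of the ATG

orf_bp_deleted = DELETION_BP - UPSTREAM_BP   # portion of the deletion inside the ORF
codons_deleted = orf_bp_deleted // 3         # whole codons removed from the 5' end
mature_aa = PRECURSOR_AA - SIGNAL_PEPTIDE_AA # residues left after signal cleavage

print(orf_bp_deleted, codons_deleted, mature_aa, ORF_BP // 3)
```

The 78 bp removed from the open reading frame correspond to 26 codons, the same number of residues as the signal sequence, which agrees with the statement that the deletion removes the start codon and the entire N-terminal signal sequence.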

To test whether this deletion mutation cosegregates with the seed coat 
peroxidase phenotype, genomic DNA from an F2 population segregating at the Ep 
locus was amplified using primers prx9+ and prx10- and F3 seed was tested for seed 
coat peroxidase activity. Figure 7 shows the results from this analysis. Of the 30 F2 
individuals tested, all 23 that were high in seed coat peroxidase activity produced 
the expected 860 bp PCR amplification product. The remaining seven F2s with low 
seed coat peroxidase activity produced no detectable PCR amplification products. 
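
The source does not report a goodness-of-fit test for these counts. As an illustrative check only, the sketch below compares the observed 23 high : 7 low F2 individuals with the 3:1 ratio expected for a single dominant locus, using a chi-square statistic with one degree of freedom; the function name and the 5% critical value of 3.841 are supplied here as assumptions and are not taken from the disclosure.

```python
def chi_square(observed, expected_ratio):
    """Chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [23, 7]            # high activity with PCR product, low activity without
stat = chi_square(observed, [3, 1])
CRITICAL_5PCT_DF1 = 3.841     # standard chi-square critical value, 1 degree of freedom

print(round(stat, 3), stat < CRITICAL_5PCT_DF1)
```

A statistic well below the critical value indicates that the counts do not depart detectably from a 3:1 segregation, which is what would be expected if the PCR marker and the high peroxidase phenotype are both determined by the dominant Ep allele.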

Finally, to determine if the OX312 (epep) and OX347 (EpEp) breeding lines 
are representative of soybean cultivars that differ in seed coat peroxidase activity, 
several cultivars were tested by PCR analysis using primer combinations targeted 
to the Ep locus. Figure 8 shows results from this analysis of six different soybean 
cultivars, three each of the homozygous dominant EpEp and recessive epep 
genotypes. As observed with OX312 and OX347, amplification products of the 
expected size were produced with primers prx12+ and prx10- regardless of the 
genotype, whereas epep genotypes yielded no product with primers prx9+ and 
prx10- or a smaller fragment with primers prx29+ and prx10-. 

All scientific publications and patent documents are incorporated herein by 
reference. 

The present invention has been described with regard to preferred 
embodiments. However, it will be obvious to persons skilled in the art that a 
number of variations and modifications can be made without departing from the 
scope of the invention as described in the following claims. 
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SEQUENCE LISTING 

(1) GENERAL INFORMATION: 
(i) APPLICANT: 
(A) NAME: Mark Gijzen 
(B) STREET: 848 Princess Avenue 
(C) CITY: London 
(D) STATE: Ontario 
(E) COUNTRY: Canada 
(F) POSTAL CODE (ZIP): N5W 3M4 
(ii) TITLE OF INVENTION: Seed Coat DNA Regulatory Region and Peroxidase 
(iii) NUMBER OF SEQUENCES: 2 
(iv) COMPUTER READABLE FORM: 
(A) MEDIUM TYPE: Floppy disk 
(B) COMPUTER: IBM PC compatible 
(C) OPERATING SYSTEM: PC-DOS/MS-DOS 
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO) 

(2) INFORMATION FOR SEQ ID NO: 1: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1244 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 
(ix) FEATURE: 
(A) NAME/KEY: CDS 
(B) LOCATION: 1..1056 
(ix) FEATURE: 
(A) NAME/KEY: sig_peptide 
(B) LOCATION: 1..77 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 
ATG GGT TCC ATG CGT CTA TTA GTA GTG GCA TTG TTG TGT GCA TTT GCT 48 
Met Gly Ser Met Arg Leu Leu Val Val Ala Leu Leu Cys Ala Phe Ala 
1 5 10 15 

ATG CAT GCA GGT TTT TCA GTC TCT TAT GCT CAG CTT ACT CCT ACG TTC 96 
Met His Ala Gly Phe Ser Val Ser Tyr Ala Gln Leu Thr Pro Thr Phe 
20 25 30 

TAC AGA GAA ACA TGT CCA AAT CTG TTC CCT ATT GTG TTT GGA GTA ATC 144 
Tyr Arg Glu Thr Cys Pro Asn Leu Phe Pro Ile Val Phe Gly Val Ile 
35 40 45 

TTC GAT GCT TCT TTC ACC GAT CCC CGA ATC GGG GCC AGT CTC ATG AGG 192 
Phe Asp Ala Ser Phe Thr Asp Pro Arg Ile Gly Ala Ser Leu Met Arg 
50 55 60 

CTT CAT TTT CAT GAT TGC TTT GTT CAA GGT TGT GAT GGA TCA GTT TTG 240 
Leu His Phe His Asp Cys Phe Val Gln Gly Cys Asp Gly Ser Val Leu 
65 70 75 80 

CTG AAC AAC ACT GAT ACA ATA GAA AGC GAG CAA GAT GCA CTT CCA AAT 288 
Leu Asn Asn Thr Asp Thr Ile Glu Ser Glu Gln Asp Ala Leu Pro Asn 
85 90 95 

ATC AAC TCA ATA AGA GGA TTG GAC GTT GTC AAT GAC ATC AAG ACA GCG 336 
Ile Asn Ser Ile Arg Gly Leu Asp Val Val Asn Asp Ile Lys Thr Ala 
100 105 110 

GTG GAA AAT AGT TGT CCA GAC ACA GTT TCT TGT GCT GAT ATT CTT GCT 384 
Val Glu Asn Ser Cys Pro Asp Thr Val Ser Cys Ala Asp Ile Leu Ala 
115 120 125 

ATT GCA GCT GAA ATA GCT TCT GTT CTG GGA GGA GGT CCA GGA TGG CCA 432 
Ile Ala Ala Glu Ile Ala Ser Val Leu Gly Gly Gly Pro Gly Trp Pro 
130 135 140 

GTT CCA TTA GGA AGA AGG GAC AGC TTA ACA GCA AAC CGA ACC CTT GCA 480 
Val Pro Leu Gly Arg Arg Asp Ser Leu Thr Ala Asn Arg Thr Leu Ala 
145 150 155 160 

AAT CAA AAC CTT CCA GCA CCT TTC TTC AAC CTC ACT CAA CTT AAA GCT 528 
Asn Gln Asn Leu Pro Ala Pro Phe Phe Asn Leu Thr Gln Leu Lys Ala 
165 170 175 

TCC TTT GCT GTT CAA GGT CTC AAC ACC CTT GAT TTA GTT ACA CTC TCA 576 
Ser Phe Ala Val Gln Gly Leu Asn Thr Leu Asp Leu Val Thr Leu Ser 
180 185 190 

GGT GGT CAT ACG TTT GGA AGA GCT CGG TGC AGT ACA TTC ATA AAC CGA 624 
Gly Gly His Thr Phe Gly Arg Ala Arg Cys Ser Thr Phe Ile Asn Arg 
195 200 205 

TTA TAC AAC TTC AGC AAC ACT GGA AAC CCT GAT CCA ACT CTG AAC ACA 672 
Leu Tyr Asn Phe Ser Asn Thr Gly Asn Pro Asp Pro Thr Leu Asn Thr 
210 215 220 

ACA TAC TTA GAA GTA TTG CGT GCA AGA TGC CCC CAG AAT GCA ACT GGG 720 
Thr Tyr Leu Glu Val Leu Arg Ala Arg Cys Pro Gln Asn Ala Thr Gly 
225 230 235 240 

GAT AAC CTC ACC AAT TTG GAC CTG AGC ACA CCT GAT CAA TTT GAC AAC 768 
Asp Asn Leu Thr Asn Leu Asp Leu Ser Thr Pro Asp Gln Phe Asp Asn 
245 250 255 

AGA TAC TAC TCC AAT CTT CTG CAG CTC AAT GGC TTA CTT CAG AGT GAC 816 
Arg Tyr Tyr Ser Asn Leu Leu Gln Leu Asn Gly Leu Leu Gln Ser Asp 
260 265 270 

CAA GAA CTT TTC TCC ACT CCT GGT GCT GAT ACC ATT CCC ATT GTC AAT 864 
Gln Glu Leu Phe Ser Thr Pro Gly Ala Asp Thr Ile Pro Ile Val Asn 
275 280 285 

AGC TTC AGC AGT AAC CAG AAT ACT TTC TTT TCC AAC TTT AGA GTT TCA 912 
Ser Phe Ser Ser Asn Gln Asn Thr Phe Phe Ser Asn Phe Arg Val Ser 
290 295 300 

ATG ATA AAA ATG GGT AAT ATT GGA GTG CTG ACT GGG GAT GAA GGA GAA 960 
Met Ile Lys Met Gly Asn Ile Gly Val Leu Thr Gly Asp Glu Gly Glu 
305 310 315 320 

ATT CGC TTG CAA TGT AAT TTT GTG AAT GGA GAC TCG TTT GGA TTA GCT 1008 
Ile Arg Leu Gln Cys Asn Phe Val Asn Gly Asp Ser Phe Gly Leu Ala 
325 330 335 

AGT GTG GCG TCC AAA GAT GCT AAA CAA AAG CTT GTT GCT CAA TCT AAA 1056 
Ser Val Ala Ser Lys Asp Ala Lys Gln Lys Leu Val Ala Gln Ser Lys 
340 345 350 

TAAACCAATA ATTAATGGGG ATGTGCATGC TAGCTAGCAT GTAAAGGCAA ATTAGGTTGT 1116 
AAACCTCTTT GCTAGCTATA TTGAAATAAA CCAAAGCAGT AGTGTGCATG TCAATTCGAT 1176 
TTTGCCATGT ACCTCTTGGA ATATTATGTA ATAATTATTT GAATCTCTTT AAGGTACTTA 1236 
ATTAATCA 1244 



(2) INFORMATION FOR SEQ ID NO: 2: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 3359 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 
(ix) FEATURE: 
(A) NAME/KEY: exon 
(B) LOCATION: 192..411 
(ix) FEATURE: 
(A) NAME/KEY: exon 
(B) LOCATION: 1042..1233 
(ix) FEATURE: 
(A) NAME/KEY: exon 
(B) LOCATION: 2264..2429 
(ix) FEATURE: 
(A) NAME/KEY: exon 
(B) LOCATION: 2692..3174 
(ix) FEATURE: 
(A) NAME/KEY: intron 
(B) LOCATION: 412..1042 
(ix) FEATURE: 
(A) NAME/KEY: intron 
(B) LOCATION: 1234..2263 
(ix) FEATURE: 
(A) NAME/KEY: intron 
(B) LOCATION: 2430..2691 
(ix) FEATURE: 
(A) NAME/KEY: CDS 
(B) LOCATION: 192..411 
(ix) FEATURE: 
(A) NAME/KEY: CDS 
(B) LOCATION: 1042..1233 
(ix) FEATURE: 
(A) NAME/KEY: CDS 
(B) LOCATION: 2264..2429 
(ix) FEATURE: 
(A) NAME/KEY: CDS 
(B) LOCATION: 2692..3174 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
ZCATCATATC ATAAACAATA CGTACCTGAT ATTATCTA3T GTCTCTCAGT TTACTTTATG 60 
A3AAATTATT TTTCTTTAAA AAAAGTTAAT TAATAAAAAC ATTTGCGATA CC3TGAGTTA 120 
CAAGAAATCZ GCCGAATTCA TCTCTATAAA TAAAAGGATC TATATGAGAG GTAAAATCAT 180 
ATTAACTCAA A ATG GGT TCC ATG CGT CTA TTA GTA GTG GCA TTG TTG TGT 230 
Met Gly Ser Met Arg Leu Leu Val Val Ala Leu Leu Cys 
1 5 10 

GCA TTT GCT ATG CAT GCA GGT TTT TCA GTC TCT TAT GCT CAG CTT ACT 278 
Ala Phe Ala Met His Ala Gly Phe Ser Val Ser Tyr Ala Gln Leu Thr 
15 20 25 

CCT ACG TTC TAC AGA GAA ACA TGT CCA AAT CTG TTC CCT ATT GTG TTT 326 
Pro Thr Phe Tyr Arg Glu Thr Cys Pro Asn Leu Phe Pro Ile Val Phe 
30 35 40 45 

GGA GTA ATC TTC GAT GCT TCT TTC ACC GAT CCC CGA ATC GGG GCC AGT 374 
Gly Val Ile Phe Asp Ala Ser Phe Thr Asp Pro Arg Ile Gly Ala Ser 
50 55 60 

CTC ATG AGG CTT CAT TTT CAT GAT TGC TTT GTT CAA G TACGTACTTT 421 
Leu Met Arg Leu His Phe His Asp Cys Phe Val Gln 
65 70 
TTTTTTTCCT TCCAAAATGC C CTG CAT ATT TAACAAGATT G CTTTGTT CA CCTAGAAAAA 481 
20 TGTGTTTTTT TCAACGATCT TACGTACGTT TGTTTGGTTT GAAAAATAAA TCAGAAAGAG 54 1 

AT CLT -AG AAAA TAGCTAGAAA Z AAAG CAACG TTTTTTTAAA AGO T ATTT AG TGTGAGAAAA 601 
ATATTAAAAC T G AAG AG AAA GAAATTAAAT AAGCTTTTCT TGAATGATAT TTACATGTCT 6 61 

T A TT AA CTT A AAGTCACCTT TTTTCTTTAA GTTGTGCTTG AAGAAAAAAG ATGTCTTTCA 721 
GTTTAGTTTT G ATT AAT GCT AA TT AT ATTT TTAATTAATT AATTAATACT ATATATCTAT 781 
25 TTA C CAT ATT AATTATTACT ATATTTCATG ATGACAACAG ACAAGTATTC T AAAG AGG T A 84 1 

TC3GTA3ATG ATTAATTTTT "ATAAAAAA ATCTT1 l'GCG TO T AT AGATA TTCTTTTATA 90 1 

ATTGGTCCAG AAACTTGTAA TGCTAATTGC AA TTT AAT CTT ACATTGATTA ACTAATAGCT 961 
ATAATCAATA TTT AGG TT AG GTATAGGAGA CAAATCAAGT GAT CTG AACA AATTAAGTTG 1021 
TTATATTTGC ATTGTGACAG GGT TGT GAT GGA TCA GTT TTG CTG AAC AAC 1071 
Gly Cys Asp Gly Ser Val Leu Leu Asn Asn 
1 5 10 

ACT GAT ACA ATA GAA AGC GAG CAA GAT GCA CTT CCA AAT ATC AAC TCA 1119 
Thr Asp Thr Ile Glu Ser Glu Gln Asp Ala Leu Pro Asn Ile Asn Ser 
15 20 25 

ATA AGA GGA TTG GAC GTT GTC AAT GAC ATC AAG ACA GCG GTG GAA AAT 1167 
Ile Arg Gly Leu Asp Val Val Asn Asp Ile Lys Thr Ala Val Glu Asn 
30 35 40 

AGT TGT CCA GAC ACA GTT TCT TGT GCT GAT ATT CTT GCT ATT GCA GCT 1215 
Ser Cys Pro Asp Thr Val Ser Cys Ala Asp Ile Leu Ala Ile Ala Ala 
45 50 55 

GAA ATA GCT TCT GTT CTG GTAATTAATA ACTCCTAATT AATTCCCAAC 1263 
Glu Ile Ala Ser Val Leu 
60 

CATTAAAAAG TTGCATGATT GGATTCAAAA TTCTATGGTA TTGGGGTTCT GATATAAATT 1323 
TGTAATTAAA TTGCACTAAA AAAAATTATC ATATACTTTT AATAAAAAAA ATTTATCTAA 1383 
TTTAATTTAT TATTAAAACT ATTTTTAAAA TTCAATCCTA ACTCTTTTTT AATCGGAGCA 1443 
7GTAAGCTGG CACCCACCGT ATATCGTTGG AAGATGCTAT AAAACCATTT AATTAATGGA 1503 
TGGAATCAGT CAAAACATTT AATTCAAAAT ACTCTTAATT GTGATTAGTA ATCATGTTCG 1563 
GGCAAGTTAC GTTGTGTATA ATTAATTTGA CTTA VTCAGA TAAAAAAACA AATGGACGCA 1623 
AGCCGGTTGG TATAGATATC ACTGGCCTGT AGAATATGTG GTTTTTCACG TTTAAATAAA 1683 
AGCTAGCTAC TATATTATAT TTAGTCTTTT TTTTTCTTAA ACCCATTTAA CGTGATTTAT 1743 
TGACTGTGAA ACATGTTTCC ACACACAGGC TTAGAAACTC CTCGCAACTA ACATCTCCAA 1803 
AATTTGACTA TTTATTTATG AAGATAATTC ATCTATGATG TTCAACTCTA TTATATATAT 1863 
GTATCA.CGC AGTATTAAGA ATTATAATAG TCAAATATAG AAGTATATCG GGTAAATGTA 1923 
GTTGCATGTG CGACCTGTTT CGTGTAAAAT GCTTATTCTA TATAGCTTTT TTTATTGGAA 1983 
AATAACGATG AACTAAAAAC GAAAGGGTAT CATATAGTTT GACTTTTATG T7AGAGAGAG 2043 
ACATCTTAAT TTGGTCATAT GTTAAATAAT TAATTACAAT CCATACACAA ATATTTATGC 2103 
CATATCTAAA AAATGATAAA ATATCATAGG TATACTCAAC TATATGATAT CCCCATAACA 2163 
GAAATTGTAC TTTTCTTCAG GCAATGAACT TAACATTTCT GTTTGCTAAA AACAAACATC 2223 
CACTTAAAGT GGTTCAACAT ATTTATCTAA TAATTTACAG GGA GGA GGT CCA GGA 2278 
Gly Gly Gly Pro Gly 
1 5 

TGG CCA GTT CCA TTA GGA AGA AGG GAC AGC TTA ACA GCA AAC CGA ACC 2326 
Trp Pro Val Pro Leu Gly Arg Arg Asp Ser Leu Thr Ala Asn Arg Thr 
10 15 20 

CTT GCA AAT CAA AAC CTT CCA GCA CCT TTC TTC AAC CTC ACT CAA CTT 2374 
Leu Ala Asn Gln Asn Leu Pro Ala Pro Phe Phe Asn Leu Thr Gln Leu 
25 30 35 

AAA GCT TCC TTT GCT GTT CAA GGT CTC AAC ACC CTT GAT TTA GTT ACA 2422 
Lys Ala Ser Phe Ala Val Gln Gly Leu Asn Thr Leu Asp Leu Val Thr 
40 45 50 

CTC TCA G GTATACATAA TCAATTTTTT ATTTGCTATT AGCTAGCAAT AAAAAGTCTC 2479 
Leu Ser 
55 

TCATACAGAC ATATTTAGAT AAATTAATTT CTCCATAAAC ATTTATAATA AAATTATCAA 2539 
TTTATGTACT TAAAAATTAT GGATTGAAGC TCTTTTCATC CAACTTTTAC TAAAGTTAAC 2599 
G7GCATATAA TATAAAATAA ACTATCTCTT GTTTCTTATA AAAAGATTCA AGATAAGTTA 2659 
AAGTCTACTT ATAAATCA7T AATATATGTA TA GGT GGT CAT ACG TTT GGA AGA 2712 
Gly Gly His Thr Phe Gly Arg 
1 5 

GCT CGG TGC AGT ACA TTC ATA AAC CGA TTA TAC AAC TTC AGC AAC ACT 2760 
Ala Arg Cys Ser Thr Phe Ile Asn Arg Leu Tyr Asn Phe Ser Asn Thr 
10 15 20 

GGA AAC CCT GAT CCA ACT CTG AAC ACA ACA TAC TTA GAA GTA TTG CGT 2808 
Gly Asn Pro Asp Pro Thr Leu Asn Thr Thr Tyr Leu Glu Val Leu Arg 
25 30 35 

GCA AGA TGC CCC CAG AAT GCA ACT GGG GAT AAC CTC ACC AAT TTG GAC 2856 
Ala Arg Cys Pro Gln Asn Ala Thr Gly Asp Asn Leu Thr Asn Leu Asp 
40 45 50 55 

CTG AGC ACA CCT GAT CAA TTT GAC AAC AGA TAC TAC TCC AAT CTT CTC 2904 
Leu Ser Thr Pro Asp Gln Phe Asp Asn Arg Tyr Tyr Ser Asn Leu Leu 
60 65 70 

CAG CTC AAT GGC TTA CTT CAG AGT GAC CAA GAA CTT TTC TCC ACT CCT 2952 
Gln Leu Asn Gly Leu Leu Gln Ser Asp Gln Glu Leu Phe Ser Thr Pro 
75 80 85 

GGT GCT GAT ACC ATT CCC ATT GTC AAT AGC TTC AGC AGT AAC CAG AAT 3000 
Gly Ala Asp Thr Ile Pro Ile Val Asn Ser Phe Ser Ser Asn Gln Asn 
90 95 100 

ACT TTC TTT TCC AAC TTT AGA GTT TCA ATG ATA AAA ATG GGT AAT ATT 3048 
Thr Phe Phe Ser Asn Phe Arg Val Ser Met Ile Lys Met Gly Asn Ile 
105 110 115 

GGA GTG CTG ACT GGG GAT GAA GGA GAA ATT CGC TTG CAA TGT AAT TTT 3096 
Gly Val Leu Thr Gly Asp Glu Gly Glu Ile Arg Leu Gln Cys Asn Phe 
120 125 130 135 

GTG AAT GGA GAC TCG TTT GGA TTA GCT AGT GTG GCG TCC AAA GAT GCT 3144 
Val Asn Gly Asp Ser Phe Gly Leu Ala Ser Val Ala Ser Lys Asp Ala 
140 145 150 

AAA CAA AAG CTT GTT GCT CAA TCT AAA TAA ACCAATAATT AATGGGGATG 3194 
Lys Gln Lys Leu Val Ala Gln Ser Lys * 
155 160 

TGCATGCTAG CTAGCATGTA AAGGCAAATT AGGTTGTAAA CCTCTTTGCT AGCTATATTG 3254 
AAATAAACCA AAGGAGTAGT GTGCATGTCA ATTCGATTTT GCCATGTACC TCTTGGAATA 3314 
TTATGTAATA ATTATTTGAA TCTCTTTAAG GTACTTAATT AATCA 3359 
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THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE 
PROPERTY OF PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS 



An isolated DNA molecule having the nucleotide sequence of SEQ ID NO: I. 

An isolated DNA molecule comprising a nucleotide sequence substantially 
homologous to that of SEQ ID NO:2. 

The isolated DNA molecule of claim 2 having the nucleotide sequence of 
SEQ ID NO:2. 



An isolated DNA molecule encoding a DNA regulatory element comprising 
a nucleotide sequence substantially homologous to that of 1- 191 of SEQ ID 

NO:2. 



5. The isolated DNA molecule of claim 4, wherein the DNA regulator/ element 
comprises the nucleotide sequence of 1-191 of SEQ ID NO:2. 

6. An isolated DNA molecule of claim 2 comprising the nucleotide sequence 
of 412-1041 of SEQ ID NO:2. 

7. An isolated DNA molecule of claim 2 comprising the nucleotide sequence 
of 1234-2263 of SEQ ID NO:2. 

8- An isolated DNA molecule of claim 2 comprising the nucleotide sequence 
of 2430-2691 of SEQ ID NO:2. 



9. A vector which comprises a DNA molecule selected from the group 
consisting of SEQ ID NO:l. SEQ ID NO:2 and nucleotides 1-19J of SEQ 
ID NO:2. 
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10. A vector of claim 9 wherein the DNA molecule comprises nucleotides 1-191 
of SEQ ID NO:2. 

11. A vector of claim 10 which comprise a gene of interest under the control 
of the DNA molecule. 

12. A host cell capable of expressing the DNA molecule within the vector of 
claim 9. 

13. A transgenic plant comprising the vector of claim 9. 

14. A method for the production of soybean seed coat peroxidase in a host cell 
comprising: 

i) transforming the host cell with the vector comprising an isolated DNA 
molecule selected from the group consisting of SEQ ID NO: 1, SEQ ID NO:2 
and nucleotides 1-191 of SEQ ID NO:2; and 

ii) culturing the host cell under conditions to allow expression of the soybean 
seed coat peroxidase. 

15. A process for producing a heterologous gene of interest within seed coat 
cells comprising propagating a transformed plant with the vector of claim 11. 
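The claims identify regions of SEQ ID NO:2 by nucleotide position (1-191, 412-1041, 1234-2263 and 2430-2691). As an illustration only, the sketch below shows how such 1-based, inclusive coordinates map onto a sequence string. The helper names `subsequence` and `range_lengths` and the placeholder sequence are hypothetical; the placeholder is a stand-in and is not SEQ ID NO:2.

```python
def subsequence(seq, start, end):
    """Return nucleotides start..end of seq using 1-based inclusive coordinates."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates out of range")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return seq[start - 1:end]


# Nucleotide ranges of SEQ ID NO:2 as recited in the claims.
CLAIMED_RANGES = {
    "claims 4-5": (1, 191),
    "claim 6": (412, 1041),
    "claim 7": (1234, 2263),
    "claim 8": (2430, 2691),
}


def range_lengths(ranges):
    """Length in nucleotides of each inclusive range."""
    return {name: end - start + 1 for name, (start, end) in ranges.items()}


# Hypothetical stand-in sequence, long enough to cover every claimed range.
placeholder = "ACGT" * 840

for name, (start, end) in CLAIMED_RANGES.items():
    fragment = subsequence(placeholder, start, end)
    print(name, start, end, len(fragment))
```

With the real sequence substituted for the placeholder, the same call would return the fragment recited in each claim; the lengths follow directly from the inclusive coordinates.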
Figure 1 

ATGGGTTCCATGCGTCTATT 20 
M G S M R L L 

prx9* > 

AGTAGTGGCATTGTTGTGTGCATTTGCTATGCATGCAGGTTTTTCAGTCTCTTATGCTCA 80 

VVALLCAFAMHAGFSVSYAQ 1 
signal sequence 

GCTTACTCCTACGTTCTACAGAGAAACATGTCCAAATCTGTTCCCTATTGTGTTTGGAGT 140 

LTPTFYRETCPNLFPIVFGV 21 



prx12* 



AATCTTCGATGCTTCTTTCACCGATCCCCGAATCGGGGCCAGTCTCATGAGGCTTCATTT 200 

IFDASFTDPRIGASLMRLHF 41 

active site 

I 

TCATGATTGCTTTGTTCAAGGTTGTGATGGATCAGTTTTGCTGAACAACACTGATACAAT 260 
HDCFVQGCDGSVLLNNTDTI 61 



--prxlO- - prx2* 



^ pcx^* > 

AGAAAGCGAGCAAGATGCACTTCCAAATATCAACTCAATAAGAGGATTGGACGTTGTCAA 320 

ESEQDALPNINSIRGLDVVN 81 

TGACATCAAGACAGCGGTGGAAAATAGTTGTCCAGACACAGTTTCTTGTGCTGATATTCT 380 

DIKTAVENSCPDTVSCADIL 101 

II 

TGCTATTGCAGCTGAAATAGCTTCTGTTCTGGGAGGAGGTCCAGGATGGCCAGTTCCATT 440 

AIAAEIASVLGGGPGWPVPL 121 

AGGAAGAAG<X;ACAGC^AACAGCAuAACCGAACCCTTGCAAATCAA^ 500 

GRRDSLTAMRTLANQMLPA? 141 

TTTCTTCAACCTCACTGAACTTAAAGCTTCCTTTGCTGT^ 

F FNLTOLKASFAVQGLNT L 



D 61 



III 

TTTAGTTACACTCTCAG GTGGTCATACGTTTGGAAGAGCTCGGTGC^GTACATTCATA.^A o'-0 

Y T l ? G g H T Z GRARCoTF IN ^81 

heme-binding domain 

CCGATTATACAACTTCAGCAACACTGGAAACCCTGATCCAAC 680 

Rlynfsntgmpdptlntty :. 2-31 

AGAAGTATTGCGTGCAAGATGCCCCCAGAATGCAACTGGGGATAACCTCACCAATTTGGA "> 4 c 

EVLRAP. CPQNATGDNLTNLD 221 

\CACCTGATCAATTTGACAACAOATACTACTCCAATCTTCTG CAGCTCAATGG 
TFOQFDNRYYSNLLQLNG 



CCTGAGC 
L S 



AATGGGTAATATTGGAGTGCTGACTGGGGATGAAGGAGAAATTCG CTTGCAATGTAATTT 
MGMIGVLTGOBGElRLQCNr 

TGTGAATGGAGACTCGTTTGGATTAGCTAGTGTGGCGTCCAAAGATGCrAAACAAAAGCr 
VNGDSFGLASVASKDAKQKL 

TG^G^C^TCTA^T^CCAATAATTAATGGGGATGTGCATGCT^ 



800 
241 



CTTACrrCAGAGTCUCCAACy^CTTTrCTCCACTCCTC^TG fl S 0 

LLQSDOELKSTPGADTI P i" V 261 

< prx6 - - - 

CAATAGCTTCAGCAGTAACCAGAATACTTTCTTTTCCAACTTTAGAGTTTCAATGATAAA 9 "> o 

NSFS SHQNTFFSNFRVSMI K 261 



980 
P 301 



1040 

Q X L 321 

11C0 
326 



Gowling, Strathy & Henderson 
AJGTAAATTAGGTTGTAAACCTCTTTGCTAGCTATATTGAAATAAACCAAAGGAGTAGTG 1160 

TGCATGTCAATTCGATTTTGCCATGTACCTCTTGGAATATTATGTAATAATTATTTGAAT 1220 
CTCTTTAAGGTATTTAATTAATC (A)n 



Figure 2A 

ATGGGTTCCATGCGT - CTATTAGTAGTGGCATTGTTG 



X 9 0 6 s 3 G GCAAA - CAATGAACTCCCTTCGTGCTGTAGCAAT AG • CTTTGTGC 4 4 

X906 94 GCTCTTCAAAACAATGAACTCC - - TTAGCAACTT - CTATGTGG 40 

L36 156 --CTCC- TTAGCAACTT - CTATGTGG 22 

y 90692 AATGCTTGGT CTAAGTGCAACAGCTTTTTGcTTGTATGG 3 8 

L78I6 3 :*GT GCATTT -GCTATGCATGCAGGTTTTTCAGT - - - CT CTTATGC 7? 

U41657 - 0 

X 9 0 6 9 3 TGTATTGTG GTTGTGCTTGGAGGGTTACCC TTCT CTTCAAATGC 8 8 

X 9 0 6 9 4 TGTGT rGTGCTTTTAGTTGTGCTTGGAGGACTACCCrTTTCCTCAGATGC 9 0 

L 3 6 1 S 6 TGTGTTG rGCTTTTAGTTGTGCTTGGAGGACT ACCCTTTTC CTCAGATGC 7 2 

X90692 TGT - TT3TGCTAAT TGGAGGAGTACCCTTTT - - -CAAATGC 75 

L7 8 1 6 3 TCAGCTTACTCCTACGTTCTACAGAGAAACATGTCCAAATCTGTTCCCTA 12 7 

U41657 " 0 

X 9 C 6 9 3 GCAACITGATCCATC CnvriA CAGGAACACTTGTCCAAATGTTAGTTCCA x 3 8 

X 9 06 94 ACAACTTAGTCCCA CITTI I ACAGCAAAACGTGTCCAACTGTT AC ""CCA 14 0 

L3 6156 ACAACTTAGTCCCACTTTTTACAGCAAAACGTGTCCAACTGTTA .CCA 12 2 

X 5 0 6 9 2 ACAACTAG ATCCTTCATTTTACAACAGTACATGTTCT AATCTTGA TTCAA 12 5 

L 7 3 1 6 3 TTGTGTTTGGAGTAATCTTCGATGCTTCTTT CACCGATCCCCGAATCGGG 17 7 

U41657 - 0 

X 9 0 6 9 3 TTGTTCGTGAAGT»TAAGGAGTGTTTCTAAGAAAGATCCTCGTATGCTT 18 3 

X 9 0 6 9 4 TTGTTAGCAATGTCTTAACAAACGTTTCTAAGACAGATCCTCGCATGCTT 19 0 

L 3 6 1 5 6 TTGTTAGCAATGTCTTAACAAACGTTTCTAAGACAGATCCTCG CATG CTT 17 2 

X 3 0 6 9 2 TC3TACGTGGTGTGCTCACAAATGTTTCACAATCTGATCCCAGAATG CTT 1 " 5 

L 7 8 1 fa 3 GCCAGTCTCATGAGGCTTCATTTTCATGATTGCTTTGTTCAAGGTTGTGA 2 2 7 

U4 16 57 - -TTTCATGATTGCTTTGTTCAAGGTTGTGA 2 9 

X 906 93 GCTAGTClTGTCAGGCTTCACTTTCATGACTGTTTTGTTCAAGGTi'GTGA 23 8 

X 9 0 6 94 GCTAGTCTCGTCAGGCTTCACTTTCATG ACTGTTTTGTTCTGGGATGTGA 2 4 0 

L 3 6 . 5 6 GCr A GTCTCGTCAGGCTTCACTTTCATG ACTGTTTTGTTCTGGGATGTGA 2 2 2 

X 9 0 6 9 2 GGT AGTCTCATCAGGCTACATTTTCATGACTGTTTTGTTCAAGGTTGCGA 2 2 5 



L 7 8 1 - 3 . TGGATCAGTTTTGCTGAACAACACTGATACAAT AGAAAGCGAGCAAGATG 2 7 7 

U4 1 6 5 n TGG ATCAGTTTT ACTGAACAACACTG ATA CAATAGA AAGCG AGCAAG ATG 7 9 

X 9 0 6 9 3 TGCATCAGTTTTACTAAACAAAACTGATACCGTTGTGAGTGAACAA3ATG 2 8 8 

X 9 0 6 9 4 TGCCTCAGTTTTGCTGAACAATACTGCTACA-\TCGTAAGCGAACAACAAG 2 9 0 

L3 6 156 TGCCTCAG ITIT GCTGAACAATACTGCTACAATCGTAAGCGAACAACAAG 272 

X 906 9 2 TGCCTCGATTTTGCTGAACGATACGGOTACAATAGTGAGCGAGCAAAGTG 2 75 



L-7 8 1 5 3 CACTTCCAAATATCAACTCAATAAGAGGATTGGACGT^fGTGAATGACATC 3 2 7 

U 4 1 6 5 7 CACTTCCAAATATCAACTCAATAAG/.GGATTGGACGTTGTCAATGACATC 12 9 

X 9 06 9 3 CTTTTCCAAACAGAAACTCATTAAGAGG7^rTG<^TGTTGTGAATCAAATC 3 3 8 

X 9 (. 6 S 4 CTTTTCCAAATAACAACTCTCTAAGAGGTTTGGATGTTGTGAATCAGATC 3 4 0 

L3 6 1 56 CTTTTCCAAATAACiUCTCTCTAAGGGGTTTGGATGTTGTGAATCAGATC 3 2 2 

X 3 0 6 9 2 CACC\CCAAATAACAACTCCATAAGAGGTTTGGATGTGATAAACCAGATC 3 2 5 



L 7 8 1 6 3 AAGACAGCGGTGGAAAATAG TTGT C CAG A.CACAGTTT CTTGTG CTGAT AT 3 7 7 

*J4 1 6 5 7 AAGACAGCGGTGGAAAATAG TTGT CCAGACAC-\GTTTCTTGTGCTG AT AT 17 9 

X 906 9 3 AAAACAGCTGTGGAAAAGGCTTGT rCTAACACAGTTTCTTGTGCTGATAT 3 88 

X90 6 94 AAACTGGCTGTAGAAGTGCCTTGTCCTAACACAG 1 1 iLl 1 GTGCTGATAT 3 90 

1-36 156 AAAACTGCTGTAGAAAGTGCTTGTCCTAACACAGTTTCTTGTGCTGATAT 372 

X 906 92 AAAACAGCGGTGGAAAATCCTTGTCCTAACACAGTTTC-:-rGTGCTGATAT . 3 75 



1-78163 TtrTTGCTATTGCAGCTGAAATAGCTTCTGTT - CTGGCAGGAGGTC CAGCA 426 

LT4 1 6 5 7 7 :*TTGCT ATTGCAGCTGAAA TAG C TTCTGTTGC TGGGAGG AGGT C - AGGA 2 2 e 

X90693 T;:TTGCTCTTTCTGCTaAATTATCATCTACA-CTGGCA-7 TXTTCCTGAC 437 

X906 9 4 ■•:*TTGCACTTGCTGCTCAAGCATCCriCTGTT-CT^ ' A AGGTCCTAGT 439 

1-36 156 TCTTGCA CTTG CT CAAGCATCCT . ".TT- CTw^ACAAGGTCCTAGT 415 

X 906 9 2 "TTGCTCTTTCTGCTGAAATATCATC^-'tAT-CTGOCAAATGGTCCTACT 4 24 
U41657 ATAATTATTTGAATCTC AAAAAAAAAAAAAAAA 1031 

X90693 AAAATCTTTTGGATTTC ATTTGAAGTGTTTCT - - - - 1200 

Y 9 Of? 94 x^yju 

L36156 TGT-TCTT C - - t TTGGTATTATACT A - - T 1200 

X906 92 GGGA-CTGTAGAAGCTCCCTAATAATATTTGTGTCAAAGT 12 00 
Figure 2B 

:.7816 3 KGSKRLLWALLCAFAMHAGFSVSY AQLTPTFYRETCP!*LFPIVFGV 47 

'J41657 - 0 

X90693 MNS LRA V A I ALC C I V - - WLGGLP FS SN AQ LD P S FYRKTC PNVS S I VR EV 48 

X90694 MNSL- - -ATSMWCV^L WLGGLP FSSDAQLSPT FYS KTCPTVSS I VSW 47 

136156 M WCT/VLLVVLGGLPFSSDAQLSPTFYSKTCPTVSS I VSNV 4 0 

X 9 0 6 9 2 MLGLS ATA FCCMVTVL IGGVPFS - NAQLDPS FYNS TCSNLDS IVRGV 4 6 

L7 8 1 6 3 I FT) AS FTDPRIGASLMRLHF'HDCFVQGCDGS VLLNNTDT I ESEQDALPNI 9 7 

U4 16 57 --FHDCFVQGCDGSVLLNNTDTIESEQDALPNI 3 I 

X906 93 I RSVS KKD PRMLAS LVRLHFHDCFVQGCDAS VLLNKTDTWS EQDAF? 98 

X 9 0 6 9 4 LTNVS KTD P RMLAS LVRUf FHDCFVU5CXIAS VLLNNTAT I VS EQQAFPNN 9 7 

L 3 6 1 5 6 LTNVS KTD PRMLAS LVRLHFHDCFVLGCDAS VLLNNTATI VSEQQAFPNN 9 0 

X906S2 LTNVSQSDPRMLGSLIRI-HFHDCFVQGCDASILLNDTATIVSEOSAPPNTJ 96 



L78163 NSI RGLDWND I KTAVENS CPDTVS CAD I LA I AAE I AS VLGGG PGWPVPL 14 7 

U4 1657 NSIRGLDVa'NDIKTAVENSCPOTVSCADILAIAAEIASVAGRRSGWPVPL 8 1 

X906 93 N5UlGLDV\^QIKTA\^KACPNTVSOUDIlJU-SAELSSTLADGPDW^'PL 14 8 

X90694 NS^GUDV\^QIKLAVTTVTCPNTV 14 7 

L36 156 NSI^GU)VVNQIKTAVESACPNTVSCADILAJ^-QASSV^^ 13 9 

X906 92 NS I RG LD V I NQ I KT A VENAC PNTVS CAD I LALS AE I S S DLANG PTWC VP L 146 



L7 8 1 6 3 G RRD S LT ANRTLANQNL PAP FFNLTQ LKAS FAVQGLNTLDLVTLSGGHTF 19 7 

U4 1 6 5 7 GRRDSLTANRTLANQNL PAPFFNLTQLKASFAVQGLNTLDLVTLSGGHTS 131 

X906 9 3 GRRDGLTAWQLLANQNLPAPFNTTDQIJ<AAFAAQGI^TTDLVALSGAH 198 

X906 94 GRRDGLTANRTXJVNQm,PAPFNSl^tJCAAFTAOGL^^n*DLVALSGAHTF 197 

L3 6 1 56 GRRDGLTANRTLANQNLPAPFNSLDHLKLHLTAQGLITPVLVALSGAHTF 1 8 9 

X906 92 GRRDS LTANNS LAAQNLPAPTFNLTRJLJCShTFDNQNLSTTDLVALSGGHT I 196 
******** **••***» ^ ^ • * * ******* 

L7 9 16 3 GRAJ^CSTFIKRLYNFStTTGNPDPTLKTTYLEVLRARCPQ 2 4 7 

U4 1 6 5 7 GRARCSTF INRLYNFSNTGLIH- - U>TTYLEVLRARCPQNATGDMLTNLD 17 9 

X906 9 3 GRAHCSLFVSRLYNFSGTGSPDPTLNTTYUJOLRTICPNGGPGTNLTNFD 24 8 

X 9 0 6 9 4 GRAHCAQ FVS RLYNFSS TGS PDPTLNTTYLQQLRT I C PNGG PGTNLTN FD 2 4 7 

L36 1 56 GRAHCAQ FVS RLVTiFSS TGS PDPTLOTTTYLQQLRT I C PNGG PGTNLTNFD 2 3 9 

X 9 0 5 9 2 GRGOCRFFTORLYNFSrrrGNPDSTLNTTrLQTLQAI CPNGGPGTNLTDLD 2 4 6 
**..* * ****** ** * * * * * * •* * • ■ * * 

^78163 LSTPDQFDNRYYSNLLQLNGLLQSDQELFSTPGADTI P I VNSFSSNQNTF 297 

U4 16 5 7 LSTPDQFDNRYYSNLLQLNGLLQSDQERFSTPGADTI PLS I A- SANQNTF 2 2 8 

X906 93 PTTPDKFDKNYYSbO-QVKKGIXOSDQELFSTSGSDTISIVNKFATDQKAF 2 98 

X 90 6 94 PTTPDKFDKNYYSNLQVKKGLLQSDQELFSTSGADT I S I VNKFSTDQNAF 2 9" 

L36 1 56 PTTPDKFDKNYYSNLQVKKGLLOSDQELFSTSGADTIS IVDKFSTDQNAF 2 8 9 

X 906 92 PTTPDTFDSNYYSNLQVGKGLFQSDQELFSRNGSDTIS I VNSFANNQTLF 2 96 
.**♦**..♦**** ♦**♦#***. **** * * 

L7816 3 FSKFRVSMIKMGNIGVLTGDEGEIRLQCNFVN GDSFGLASVAS - K 34 1 

U41657 FSNFRVS M I KMGN I GVLTGDEOB I RLQCNFVN GDSFGLASVAS - K 272 

X90693 FESFRAAMIKMGNIGVLTGNQGBIRKQCNTVN- - - SKSAELGL INVA3 - A 344 

X906 94 FESFKAAMIKMGNIGVLTGTXGBIRKQCNTVNFVNSNSAELDLATIAS IV 34 7 

L36156 FESFKAAMIKMGIUGVLTGTKGEIRK3CNFVN- - - SNSAELDLATI AS I V 336 

X 906 92 FENFVASMIKMGNIGVLTGSOGEIRTQCNAVN GNSSGLATWT - K 3 4 0 
L78163 DAKQKLVAQSK 352 

U41657 DAKQKLVAQSK 283 

X90693 DSSEEGMVSSM 355 

X90694 ESLEDGIASVI 358 

L36156 ESLEDGIASVI 347 

X90692 ESSEDGMASSF 351 
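Similarity between aligned rows such as those in Figure 2B is commonly summarised as percent identity. The sketch below is an illustrative calculation only, not a method described in this document; the function name `percent_identity` and the convention of skipping gap columns are assumptions. It is applied to the final alignment block, whose rows are legible.

```python
def percent_identity(row_a, row_b):
    """Percent identical residues between two aligned rows, skipping gap columns."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Keep only columns where neither row has a gap character.
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# C-terminal block of the Figure 2B alignment, keyed by accession number.
FINAL_BLOCK = {
    "L78163": "DAKQKLVAQSK",
    "U41657": "DAKQKLVAQSK",
    "X90693": "DSSEEGMVSSM",
    "X90694": "ESLEDGIASVI",
    "L36156": "ESLEDGIASVI",
    "X90692": "ESSEDGMASSF",
}

reference = FINAL_BLOCK["L78163"]
for accession, row in FINAL_BLOCK.items():
    print(accession, round(percent_identity(reference, row), 1))
```

Eleven residues are far too few to characterise the proteins as a whole; the same function would need to be run over the full alignment rows for a meaningful figure.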
Figure 5 
Figure 7 
Figure 8 